Main analysis (Exploratory Data Analysis)
Our data exploration starts by first looking at variables that might help explain the delay: * Date
* Day of the Week
* Time of Departure
* Distance
* Destination Airport
* Carrier
Naturally, we then proceed to examine whether delays can be ammended for by examining its relationship with the following variables:
* Arrival Delay
* Airline
* Destination Airport
* Destination State
Day of the Month
We initiate our Exploratory Data Analysis by examining the Delays in Departure time for every day of the month. The following scatterplot helps us determine whether a relationship can be inferred.
monthly_data <- mutate(df, month = format(as.Date(df$date,format="%Y-%m-%d"), "%m"),
day = format(as.Date(df$date,format="%Y-%m-%d"), "%d"))
monthly_data <- select(monthly_data, date, dep_delay, month, day)
monthly_data$month[monthly_data$month == 10] <- "October"
monthly_data$month[monthly_data$month == 11] <- "November"
monthly_data$month[monthly_data$month == 12] <- "December"
monthly_data$month = factor(monthly_data$month, levels=c("October","November","December"))
monthly_delay <- filter(monthly_data, df$dep_delay > 0)
monthly_early <- filter(monthly_data, df$dep_delay <= 0)
ggplot() +
geom_point(data = monthly_delay, aes(y = dep_delay, x = day), alpha = .5, color = "#774184", stroke = 0) +
geom_point(data = monthly_early, aes(y = dep_delay, x = day), alpha = .8, color = "#fce640", stroke = 0) +
facet_grid(month~.) +
ylab("Delay in Departure (Minutes)") +
xlab("Days of the Month") +
ggtitle("Departure Delay correlated with Days of the Month",
subtitle = "Limited between 12 hours delay and 2 hours early departure") +
labs(caption = "Source: Final Project Dataset")
Clearly we can observe that during the Christmas holidays, departure delay shows an increase in duration, thus suggesting that passengers spend a lot more time waiting in airports to depart for their final destination. The same conclusion can be derived for the weekend following Thanksgiving (Thursday the 23 of November), which again constitutes an important holiday for the US. People wait a great amount of time to return to their homes on Sunday instead of leaving a day earlier when delays are reduced. Finally, we can observe that during certain days of the week, like Tuesdays or Wednesdays, we have reduced Departure Delay times. This is expected since the main reasons for travelling during the week are business related.
Day of the Week
How are flights distributed throughout a week? Which day is most popular for travelling?
num_flights <- df %>% group_by(weekday) %>% summarize(count = n())
num_flights$perc <- num_flights$count / sum(num_flights$count)
num_flights$weekday <- factor(num_flights$weekday, levels = c('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'))
ggplot(num_flights, aes(weekday, perc)) +
geom_col(fill = '#774184' ) +
ggtitle('Percentage of Flights on Each Day') +
xlab("Day of the Week") +
ylab("Percentage of Flights") +
labs(caption = "Source: Final Project Dataset")
Flights seem to be evenly distributed over all days except Saturdays. This somewhat makes sense, as travellers with flights on Saturdays are more likely to be on vacations - and they are more likely to depart on Friday for their destination. How do delays relate to days of the week?
df$weekday <- factor(df$weekday, levels = c('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'))
ggplot(df, aes(x=weekday, y=dep_delay)) + geom_boxplot(varwidth=TRUE, fill = '#774184') +
ggtitle('Distribution of Delay on Each Day') +
xlab("") +
ylab("Delay in Minutes") +
labs(caption = "Source: Final Project Dataset")
The range of our dataset makes it hard to visualize the distribution effectively and identify any patterns. One way of addressing this is taking the log of our y-axis. Of course, logging negative values would result in erroneous results so we’ll focus on positive delays for now.
ggplot(subset(df, df$dep_delay >0), aes(x=weekday, y=log(dep_delay))) + geom_boxplot(varwidth=TRUE, fill = '#774184') +
ggtitle('Distribution of Positive Delay on Each Day') +
xlab("") +
ylab("log(Delay in Minutes)") +
labs(caption = "Source: Final Project Dataset")
Evidently, delays are very similar over each day so there’s no significant observation to be made. Can we learn anything by looking at negative delays?
df$pos_delay <- ifelse(df$dep_delay > 0, 'Positive', 'Negative')
ggplot(df, aes(x=weekday, y=log(abs(dep_delay)))) + geom_boxplot(aes(fill=factor(pos_delay)), varwidth=TRUE) +
scale_y_continuous(name = 'log(Delay in Minutes)') + xlab('Day of the Week') + guides(fill=guide_legend(title="Delay")) +
scale_fill_manual(values=c('#4E8441', '#774184')) +
labs(caption = "Source: Final Project Dataset")
More flights depart early than late, but positive delays have greater range. Early departures also don’t seem to vary by day.
Distance
Does distance have any specific characteristics and how does the Distance relate to the Delays in Departure?
ggplot(df_backup, aes(x = dep_delay, y = distance)) +
geom_point(alpha = .3, color = "#441152", stroke = 0) +
xlab("Delays in Departure (Minutes)") +
ylab("Distance (Miles)") +
ggtitle("Departure Delays in correlation with the travelled Distance") +
labs(caption = "Source: Final Project Dataset")
The scatterplot of distance in correlation with departure delay helps us determine the following:
* Different values of delay recorded for a specific distance seem to form a line.
* There exist specific intervals of time within a day that delays accumulate and these should be our main focus.
To further investigate the latter claim, we produce the following two graphs:
ggplot(df_backup, aes(x = dep_delay, y = distance)) +
geom_point(alpha = .3, color = "#4E8441", stroke = 0) +
xlim(-720,720) +
xlab("Delays in Departure (Minutes)") +
ylab("Distance (Miles)") +
ggtitle("Departure Delays in correlation with the travelled Distance",
subtitle = "Limited in 12 hours delays - early departures") +
labs(caption = "Source: Final Project Dataset")
ggplot(df_backup, aes(x = dep_delay, y = distance)) +
geom_point(alpha = .3, color = "#441152", stroke = 0) +
xlim(-180,180) +
xlab("Delays in Departure (Minutes)") +
ylab("Distance (Miles)") +
ggtitle("Departure Delays in correlation with the travelled Distance",
subtitle = "Limited in 3 hours delays - early departures") +
labs(caption = "Source: Final Project Dataset")
Now, we can clearly bound the times of the departure delays that we are going to examine between a maximum of 12 hours delay and a maximum of 2 hours early departure. This follows from the fact that a 3 hour delay or early departure interval (third graph) does not constitute a clear representation of the phenomenon that we are exploring. On the contrary, a 12 hours delay interval is more than enough to give us the full picture. When combined with the 2 hours early departure barrier we have a clear picture of the situation at hand.
Upon further investigation of the distance variable, we produced a plot of the unique values of distances traveled to reach each state.
states_recode <- c("1" = "AK", "2" = "HI", "3" = "PR", "4" = "VI", "5" = "TT", "11" = "CT", "12" = "ME", "13" = "MA", "14" = "NH", "15" = "RI", "16" = "VT", "21" = "NJ", "22" = "NY", "23" = "PA", "33" = "FL", "34" = "GA", "35" = "MD", "36" = "NC", "37" = "SC", "38" = "VA", "39" = "WV", "41" = "IL", "42" = "IN", "43" = "MI", "44" = "OH", "45" = "WI", "51" = "AL", "52" = "KY", "53" = "MS", "54" = "TN", "61" = "IA", "62" = "KS", "63" = "MN", "64" = "MO", "65" = "NE", "66" = "ND", "67" = "SD", "71" = "AR", "72" = "LA", "73" = "OK", "74" = "TX", "81" = "AZ", "82" = "CO", "83" = "ID", "84" = "MT", "85" = "NV", "86" = "NM", "87" = "UT", "88" = "WY", "91" = "CA", "92" = "OR", "93" = "WA")
distance_state_data <- df %>% select( dest_state, distance ) %>% group_by( dest_state, distance ) %>% summarize( count = n() )
distance_state_data <- distance_state_data %>% mutate( state = states_recode[as.character(dest_state)])
ggplot(distance_state_data, aes(x = state, y = distance)) +
geom_point(alpha = .8, color = "#441152", stroke = 0) +
xlab("Destination (State)") +
ylab("Distance (Miles)") +
theme(axis.text.x = element_text(size = 5)) +
ggtitle("Distance in correlation with the State Destination",
subtitle = "Unique Distances traveled to reach corresponding State destination") +
labs(caption = "Source: Final Project Dataset")
Additionally, we create a graph of the unique distances that each US carrier claims to have covered.
distance_carrier_data <- df %>% select( carrier, distance ) %>% group_by( carrier, distance ) %>% summarize( count = n() )
ggplot(distance_carrier_data, aes(x = carrier, y = distance)) +
geom_point(alpha = .8, color = "#4E8441", stroke = 0) +
coord_flip() +
xlab("US Carrier") +
ylab("Distance (Miles)") +
ggtitle("Distance in correlation with the US carrier") +
labs(caption = "Source: Final Project Dataset")
Based on the two graphs presented, we were able to observe that values of the distance variable are more discrete than they were expected to be. Logically speaking, we should have a greater variety of values since it is extremely rare for two planes to cover the exact same distance going back and forth to the same destination. By going back to the source of our data, we were able to discover that distances provided by our dataset are approximations of the shortest distance between any two points on the surface of a sphere. Therefore, any two flights with the same start and end points have the same distances. This can be verified from the table below.
distance_table <- df %>% select(origin, dest, dest_state, distance) %>% group_by(origin, dest)
head(distance_table, 10)
## # A tibble: 10 x 4
## # Groups: origin, dest [10]
## origin dest dest_state distance
## <int> <int> <int> <int>
## 1 10140 14107 81 328
## 2 14107 10140 86 328
## 3 13930 14108 41 130
## 4 14108 13930 41 130
## 5 12217 13930 41 510
## 6 13930 12217 51 510
## 7 10397 13930 41 606
## 8 13930 10397 34 606
## 9 11823 13930 41 157
## 10 13930 11823 42 157
Destination Airport
df$dest %>% unique() %>% length() #How many airports are in the data?
## [1] 300
To begin exploring the data on destination, with 300 unique values, we first construct a dataframe grouped by destination and with features for number of flights, total departure delay and mean departure delay.
df_n <- df %>% group_by(dest) %>%
summarise(n = n(),
total_delay = sum(dep_delay),
avg_delay = total_delay / n) %>%
arrange(desc(n))
With the dataframe in hand, we will look at destinations with the most flights, more than 20 000 over the period.
df_n %>% filter(n > 20000) %>% ggplot() +
geom_point(aes(reorder(dest, avg_delay), avg_delay, size=n, fill=avg_delay), shape=21, alpha=.7) +
coord_flip() + scale_fill_viridis_c() +
xlab('destination') +
labs(caption = "Source: Final Project Dataset")
11618 Newark Liberty International stands out as having the highest average delay of the lot. 14771 San Francisco International has a smaller average delay than Newark, but it stands out among those airports with more than 40 000 flights. Apart from San Francisco these airports have average delays in the neightborhood of five minutes. It is also noticeable that none of the most frequent destinations have a negative average delay.
df_n %>% filter(avg_delay < 0) %>% ggplot() +
geom_point(aes(reorder(dest, avg_delay), avg_delay, size=n, fill=avg_delay), shape=21, alpha=.7) +
coord_flip() + scale_fill_viridis_c() +
xlab('destination') +
labs(caption = "Source: Final Project Dataset")
Here we looked at those destinations with negative average delays. According to our size scale, the frequency of flights to these airports is more than an order of magnitude lower than in the previous plot. Two airports stand out in terms of average delay, 10165 Adak Island, Alaska and 10754 Barrow, Alaska.
df_n %>% filter(avg_delay < 1 & avg_delay > -1) %>% ggplot() +
geom_point(aes(reorder(dest, avg_delay), avg_delay, size=n, fill=avg_delay), shape=21, alpha=.7) +
coord_flip() + scale_fill_viridis_c() +
xlab('destination') +
labs(caption = "Source: Final Project Dataset")
Among those airports with average delays within one minute of zero, the number of flights also tends to be small, no more than 3 000.
df_n %>% filter(avg_delay > 10) %>% ggplot() +
geom_point(aes(reorder(dest, avg_delay), avg_delay, size=n, fill=avg_delay), shape=21, alpha=.7) +
coord_flip() + scale_fill_viridis_c() +
xlab('destination') +
labs(caption = "Source: Final Project Dataset")
When we filter to those airports with average delays greater than ten minutes, airport 11618 Newark stand out as by far the most popular destination in this bracket. As with the negative delay visualization, there are a small number of outliers in terms of average delay:
* 13388 Mammoth Lakes, California
* 15295 Toledo, OH
* 13964 North Bend/Coos Bay, Oregon
* 10372 Aspen, Colorado
Mammoth Lakes and Aspen are both ski resort towns so it might be interesting to look at the average delays in the summer to see if there is a reversion to the mean during warmer months.
top_airports <- df %>% group_by(dest) %>% summarize(num_flights = n())
top_airports <- top_airports[order(-top_airports$num_flights), ]
top_airports <- head(top_airports, 5)
last_airports <- df %>% group_by(dest) %>% summarize(num_flights = n())
last_airports <- subset(last_airports, last_airports$num_flights > 100)
last_airports <- last_airports[order(last_airports$num_flights), ]
last_airports <- head(last_airports, 5)
df$pop_dest <- ifelse(df$dest %in% top_airports$dest, "Popular", 0)
df$pop_dest <- ifelse(df$dest %in% last_airports$dest, "Unpopular", df$pop_dest)
head(subset(df, df$pop_dest != 0), 5)
## # A tibble: 5 x 18
## date carrier flight_num origin dest dest_state sched_dep
## <date> <chr> <int> <int> <int> <int> <dttm>
## 1 2017-10-01 OO 2931 14108 13930 41 2017-10-01 18:10:00
## 2 2017-10-01 OO 2932 12217 13930 41 2017-10-01 17:21:00
## 3 2017-10-01 OO 2933 10397 13930 41 2017-10-01 19:03:00
## 4 2017-10-01 OO 2933 13930 10397 34 2017-10-01 15:20:00
## 5 2017-10-01 OO 2936 11823 13930 41 2017-10-01 11:13:00
## # ... with 11 more variables: dep <dttm>, sched_arr <dttm>, arr <dttm>,
## # cancelled <int>, canc_code <chr>, distance <int>, weekday <fct>,
## # dep_delay <int>, arr_delay <int>, pos_delay <chr>, pop_dest <chr>
We can also visualize the distribution of departure delay at our top airports in order to see if flights are more likely to be delayed when headed to more popular destinations.
ggplot(subset(df, df$pop_dest != 0), aes(x=factor(dest), y=log(dep_delay))) + geom_boxplot(varwidth=TRUE, fill = '#774184') +
ggtitle('Distribution of Positive Delay in the busiest and least busiest Airports') +
facet_grid(. ~ pop_dest) +
xlab("Airport") +
ylab("log(Delay in Minutes)") +
labs(caption = "Source: Final Project Dataset")
The delay distribution varies very little among the top 5 airports. The delay distribution between popular and unpopular airports does not differ as much as we hoped, so there’s no insight to be made here.
Carrier
Are certain airlines more likely to experience delays?
top_carriers <- df %>% group_by(carrier) %>% summarize(num_flights = n())
top_carriers <- top_carriers[order(-top_carriers$num_flights), ]
top_carriers <- head(top_carriers, 5)
last_carriers <- df %>% group_by(carrier) %>% summarize(num_flights = n())
last_carriers <- subset(last_carriers, last_carriers$num_flights > 100)
last_carriers <- last_carriers[order(last_carriers$num_flights), ]
last_carriers <- head(last_carriers, 5)
df$pop_carrier <- ifelse(df$carrier %in% top_carriers$carrier, "Busy", 0)
df$pop_carrier <- ifelse(df$carrier %in% last_carriers$carrier, "Not Busy", df$pop_carrier)
head(top_carriers)
## # A tibble: 5 x 2
## carrier num_flights
## <chr> <int>
## 1 WN 323499
## 2 DL 221833
## 3 AA 215292
## 4 OO 180196
## 5 UA 146044
The five most popular airlines are:
* WN: Southwest Airlines
* DL: Delta Airlines
* AA: American Airlines
* OO: Skywest Airlines
* UA: United Airlines
head(last_carriers)
## # A tibble: 5 x 2
## carrier num_flights
## <chr> <int>
## 1 VX 18077
## 2 HA 19826
## 3 F9 26981
## 4 NK 37966
## 5 AS 43910
The five least popular airlines are:
* VX: Virgin America
* HA: Hawaiian Airlines
* F9: Frontier Airlines
* NK: Spirit Airlines
* AS: Alaska Airlines
A reminder that this is only in regards to domestic flights departing New York.
ggplot(subset(df, df$pop_carrier != 0), aes(x=factor(carrier), y=log(dep_delay))) + geom_boxplot(varwidth=TRUE, fill = '#774184') +
ggtitle('Distribution of Positive Delay in the most and least frequent Airlines') +
facet_grid(. ~ pop_carrier) +
xlab("Airline") +
ylab("log(Delay in Minutes)") +
labs(caption = "Source: Final Project Dataset")
The departure delay does not seem to vary as much, implying that the frequency of an airline does not play a factor on departure delay. Upon closer inspection, Skywest is the airline with the highest median delay whereas Hawaiaan has the lowest. Southwest is the most popular out of the ten, with more than 3500 flights out of New York State every day. It is also the best performing airline out of the “Busy” group.
Judging from the plots presented so far, departure delays are more likely to be caused by factors like the time of departure and the day of the week rather than the popularity of the airline or the flight’s destination. However in the event of a delay prior to departure, is it a sign that the arrival delay is going to be even larger or will the pilot make-up for the delay by getting to the destination faster? Are long flights more likely to have a high delay make-up (difference between departure and arrival delay) or do they exacerbate the situation further?
Arrival Delay
How do departure delay and arrival delay relate? Is it possible that pilots make up for delays with a shorter flight, or is it a sign that the flight is going to take even longer?
ggplot(df, aes(dep_delay, arr_delay)) +
geom_hex(bins = 50) +
scale_fill_gradientn(colours = viridis(5, direction = 1)) +
xlab('Departure Delay in Minutes') +
ylab('Arrival Delay in Minutes') +
guides(fill=guide_legend(title="Frequency")) +
labs(caption = "Source: Final Project Dataset")
The two variables appear to have a roughly linear relationship, especially at higher values. There are exceptions to this, where a large number of flights experience high arrival delay even though they had zero departure delay. This can be explained by flights that had trouble landing because destination airport landing strips were busy.
We can construct a variable called diff_delay that measures the difference between departure delay and arrival delay. This will allow us to examine whether a departure delay is likely to be amended for.
df$diff_delay <- df$dep_delay - df$arr_delay
Airline
df$pos_makeup <- ifelse(df$diff_delay >= 0, "Positive", "Negative")
df$pos_makeup <- factor(df$pos_makeup)
ggplot(subset(df, df$pop_carrier != 0 & df$pos_makeup == 'Positive'), aes(x=factor(carrier), y=log(dep_delay))) + geom_boxplot(varwidth=TRUE, fill = '#774184') +
ggtitle('Distribution of Positive Makeup Delay in the busiest and least busy Airlines') +
facet_grid(. ~ pop_carrier) +
xlab("Airline") +
ylab("log(Positive Makeup Delay in Minutes)") +
labs(caption = "Source: Final Project Dataset")
Apparently, the size of an airline doesn’t have an impact on its ability to make up for a delay.
Destination State
We wish to explore any patterns relating departure delay and arrival delay. Does the distance between New York and the destination state influence a flight’s performance? What about the destination state’s population density?
# data frame for the cleveland plot and map
df$diff <- df$dep_delay - df$arr_delay
flight <- df
flight$dest_state <- factor(flight$dest_state)
region_code <- c("1", "2", "3", "4", "5", "11", "12", "13", "14", "15", "16", "21", "22", "23", "31", "33", "34", "35", "36", "37", "38", "39", "41", "42", "43", "44", "45", "51", "52", "53", "54", "61", "62", "63", "64", "65", "66", "67", "71", "72", "73", "74", "81", "82", "83", "84", "85", "86", "87", "88", "91", "92", "93", "31")
region_name <- c("Alaska", "Hawaii", "Puerto Rico", "U.S. Virgin Islands", "U.S. Pacific Trust Territories and Possessions", "Connecticut", "Maine", "Massachusetts", "New Hampshire", "Rhode Island", "Vermont", "New Jersey", "New York", "Pennsylvania", "Delware", "Florida", "Georgia", "Maryland", "North Carolina", "South Carolina", "Virginia", "West Virginia", "Illinois", "Indiana", "Michigan", "Ohio", "Wisconsin", "Alabama", "Kentucky", "Mississippi", "Tennessee", "Iowa", "Kansas", "Minnesota", "Missouri", "Nebraska", "North Dakota", "South Dakota", "Arkansas", "Louisiana", "Oklahoma", "Texas", "Arizona", "Colorado", "Idaho", "Montana", "Nevada", "New Mexico", "Utah", "Wyoming", "California", "Oregon", "Washington", "Delaware")
flight$dest_state <- factor(flight$dest_state, region_code, region_name)
# tidy data for clevelad plot and map
flight_state <- flight %>%
group_by(dest_state) %>%
dplyr::summarize(sum_n = n(), sum_arr = sum(arr_delay), sum_dep = sum(dep_delay), sum_diff = sum(diff)) %>%
mutate(mean_arr = sum_arr/sum_n, mean_dep = sum_dep/sum_n, mean_diff = sum_diff/sum_n)
# cleveland plot of arrival time and depature time
flight_state$dest_state <- reorder(flight_state$dest_state, flight_state$mean_arr)
ggplot(flight_state) +
geom_point(aes(mean_arr, dest_state, color = 'mean_arr'), size = 3) +
geom_point(aes(mean_dep, dest_state, color = 'mean_dep'), size = 3) +
labs(title = "Arrival and Departure Delay over States", x = "Average Delay in Mintutes", y = "", color = '') +
theme(legend.position = 'bottom') +
scale_color_manual(name = '', values = c('mean_arr'= '#774184', 'mean_dep'='#4E8441'), labels=c("Arrival Delay", "Departure Delay"))
# generate arrival map format for choropleth
arr_map_form <- subset(flight_state, select = c('dest_state', 'mean_arr'))
names(arr_map_form)[1] <- paste('region')
names(arr_map_form)[2] <- paste('value')
arr_map_form$region <- as.character(arr_map_form$region)
arr_map_form$region <- tolower(arr_map_form$region)
# generate arrival map
arr_map <- StateChoropleth$new(arr_map_form)
arr_map$title = "Arrival Delay over States"
arr_map$show_labels = FALSE
arr_map$set_num_colors(1)
arr_map$ggplot_scale = scale_fill_viridis(name = 'Average Arrival Delay in Minutes')
arr_map$render()
Neither departure nor arrival delay are strongly associated with destination state. The five regions with highest arrival delays are New Jersey, Puerto Rico, the U.S Virgin Islands, Hawaii and New York. The five regions with lowest arrival delays are Virginia, Maryland, Utah, South Carolina and the U.S Pacific Trust Territories and Possessions.
We surprisingly found the average departure delay is higher than the average arrival delay in all states, no matter their location or population density. This implies that pilots can make up time during flights after delayed departures.
# cleveland plot of time difference
flight_state$dest_state <- reorder(flight_state$dest_state, flight_state$mean_diff)
ggplot(flight_state) +
geom_point(aes(mean_diff, dest_state), color = '#774184', size = 3) +
labs(title = "Difference between Departure and Arrival Delay over States", x = "Average Difference in Mintutes", y = "", color = '')
# generate diff map format for choropleth
diff_map_form <- subset(flight_state, select = c('dest_state', 'mean_diff'))
names(diff_map_form)[1] <- paste('region')
names(diff_map_form)[2] <- paste('value')
diff_map_form$region <- as.character(diff_map_form$region)
diff_map_form$region <- tolower(diff_map_form$region)
# generate diff map
diff_map <- StateChoropleth$new(diff_map_form)
diff_map$title = "Difference between Departure and Arrival Delay over States"
diff_map$show_labels = FALSE
diff_map$set_num_colors(1)
diff_map$ggplot_scale = scale_fill_viridis(name = 'Average Difference in Minutes')
diff_map$render()
To our surprise, flights to the Northwestern states (Washington, Oregon, Idaho, Montana, Wyoming, North Dakota and South Dakota) had a lower delay difference than did others. This contradicts our hypothesis that pilots can make up for time lost during longer flights. There doesn’t appear to be a relationship between destination and the difference between arrival and departure delay. In addition to the Northwest states, Alaska, Hawaii, U.S. Virgin Islands and Mississippi also have a low make-up delay.